[Minor] add non topk benchmarks for utf8/utf8view string aggregates#21073
[Minor] add non topk benchmarks for utf8/utf8view string aggregates#21073buraksenn wants to merge 4 commits intoapache:mainfrom
Conversation
|
Thanks @kosiew for the detailed review |
| assert_eq!(batch.num_rows(), LIMIT); | ||
|
|
||
| Ok(()) | ||
| Ok(format!("{}", pretty_format_batches(&batches)?)) |
There was a problem hiding this comment.
It looks like aggregate_string now pretty prints the collected batches and returns a String, and that is what b.iter(...) is measuring for each string benchmark.
This means the non TopK string benchmarks now include batch formatting and heap allocation overhead, which the numeric benchmarks do not include. So the comparison is no longer isolating query execution.
Would it make sense to keep a Result<()> fast path for the timed benchmark, and move the pretty_format_batches work into a one time validation helper that runs before benchmark registration?
There was a problem hiding this comment.
I've changed it such that it is done in the caller path and function returns vector
| .as_str(), | ||
| |b| b.iter(|| run_string(&rt, ctx.clone(), limit, true)), | ||
| ); | ||
| for asc in [false, true] { |
There was a problem hiding this comment.
Small readability thought here. Could the Utf8 vs Utf8View parity check live in a helper?
Right now criterion_benchmark is doing both benchmark registration and cross layout validation. Pulling the verification loop into something like assert_string_results_match(...) would make the benchmark matrix easier to scan.
There was a problem hiding this comment.
I've extracted two helpers instead of this
| ); | ||
| // String aggregate benchmarks | ||
| // (asc, use_topk, use_view) | ||
| let string_cases: &[(bool, bool, bool)] = &[ |
There was a problem hiding this comment.
The &[(bool, bool, bool)] tuple list is a bit hard to read and reason about when scanning or extending cases.
Maybe a small case struct or a helper similar to numeric_cases would help make the intent clearer. Something that names asc, use_topk, and use_view explicitly would also reduce the chance of mixing up tuple positions later.
There was a problem hiding this comment.
I've made this and numeric one both structs
|
Thanks for the detailed review twice @kosiew, really appreciate it :) |
Which issue does this PR close?
Rationale for this change
Details are in #19713 but main idea is to compare non-topk and topk test results so that we can compare performances
What changes are included in this PR?
Added non topk benchmark tests.
Are these changes tested?
Only test changes
Are there any user-facing changes?
No